{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tune a model via grid search\n",
    "\n",
    "```{note}\n",
    "The tutorial is for FuxiCTR v1.2.\n",
    "```\n",
    "\n",
    "This tutorial shows how to tune model hyper-parameters via grid search over the specified tuning space.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "scrolled": false
   },
   "source": [
    "We provide a useful tool script `run_param_tuner.py` to tune FuxiCTR models based on YAML config files.\n",
    "\n",
    "+ --config: The config file that defines the tuning space\n",
    "+ --gpu: The available gpus for parameters tuning and multiple gpus can be used (e.g., using --gpu 0 1 for two gpus)\n",
    "+ --tag: (optional) Specify the tag to determine which expid to run (e.g. 001 for the first expid). This is useful to rerun one specific experiment_id that contains the tag.\n",
    "\n",
    "In the following example, we use the hyper-parameters of `FM_test` in [./config](https://github.com/xue-pai/FuxiCTR/tree/v1.2.0/config) as the base setting, and create a tuner config file `FM_tuner_config.yaml` in [benchmarks/tuner_config](https://github.com/xue-pai/FuxiCTR/tree/v1.2.0/benchmarks/tuner_config), which defines the tuning space for parameter tuning. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# FM_tuner_config.yaml\n",
    "base_config: ../config/ # the location of base config\n",
    "base_expid: FM_test # the expid of default hyper-parameters\n",
    "dataset_id: taobao_tiny # the dataset_id used, which overwrites the dataset_id key in FM_test\n",
    "\n",
    "tuner_space:\n",
    "    model_root: './tuner_config/' # the value will override the default value in FM_test\n",
    "    embedding_dim: [16, 32] # the values in the list will be grid-searched\n",
    "    regularizer: [0, 1.e-6, 1.e-5] # the values in the list will be grid-searched\n",
    "    learning_rate: 1.e-3 # it is equivalent to [1.e-3]\n",
    "    batch_size: 128 # the value will override the default value in FM_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Specifically, if a key in `tuner_space` has values stored in a list, those values will be grid-searched. Otherwise, the default value in `FM_test` will be applied.\n",
    "\n",
    "Run the following command to start:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd benchmarks\n",
    "!python run_param_tuner.py --config ./tuner_config/FM_tuner_config.yaml --gpu 0 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After finished, all the searched results can be accessed from `FM_tuner_config.csv` in the `./benchmarks` folder.\n",
    "\n",
    "Note that if you want to run only one group of hyper-parameters in the search space, you can use `--tag` to specify which one to run. In the following example, 001 means the expid (i.e., FM_test_001_7f7f3b34) corresponding to the first group of hyper-parameters. It is useful when one needs to rerun an expid for reproduction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd benchmarks\n",
    "!python run_param_tuner.py --config ./tuner_config/FM_tuner_config.yaml --tag 001 --gpu 0 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "While the above example config file shows how to import base_expid and dataset_id from the base_config folder, it is also flexible to directly expand the base setting in the tunner config file. Both configurations are the same."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This example load base_expid and dataset_id from the same file\n",
    "base_expid: FM_test # the expid of default hyper-parameters\n",
    "dataset_id: taobao_tiny # the dataset_id used, which overwrites the dataset_id key in FM_test\n",
    "\n",
    "model_config:\n",
    "    FM_test:\n",
    "        model_root: '../checkpoints/'\n",
    "        workers: 3\n",
    "        verbose: 1\n",
    "        patience: 2\n",
    "        pickle_feature_encoder: True\n",
    "        use_hdf5: True\n",
    "        save_best_only: True\n",
    "        every_x_epochs: 1\n",
    "        debug: False\n",
    "        model: FM\n",
    "        dataset_id: taobao_tiny_data\n",
    "        loss: binary_crossentropy\n",
    "        metrics: ['logloss', 'AUC']\n",
    "        task: binary_classification\n",
    "        optimizer: adam\n",
    "        learning_rate: 1.0e-3\n",
    "        regularizer: 1.e-8\n",
    "        batch_size: 128\n",
    "        embedding_dim: 4\n",
    "        epochs: 1\n",
    "        shuffle: True\n",
    "        seed: 2019\n",
    "        monitor: 'AUC'\n",
    "        monitor_mode: 'max'\n",
    "    \n",
    "dataset_config:\n",
    "    taobao_tiny:\n",
    "        data_root: ../data/\n",
    "        data_format: csv\n",
    "        train_data: ../data/tiny_data/train_sample.csv\n",
    "        valid_data: ../data/tiny_data/valid_sample.csv\n",
    "        test_data: ../data/tiny_data/test_sample.csv\n",
    "        min_categr_count: 1\n",
    "        feature_cols:\n",
    "            - {name: [\"userid\",\"adgroup_id\",\"pid\",\"cate_id\",\"campaign_id\",\"customer\",\"brand\",\"cms_segid\",\n",
    "                      \"cms_group_id\",\"final_gender_code\",\"age_level\",\"pvalue_level\",\"shopping_level\",\"occupation\"], \n",
    "                      active: True, dtype: str, type: categorical}\n",
    "        label_col: {name: clk, dtype: float}\n",
    "\n",
    "tuner_space:\n",
    "    model_root: './tuner_config/' # the value will override the default value in FM_test\n",
    "    embedding_dim: [16, 32] # the values in the list will be grid-searched\n",
    "    regularizer: [0, 1.e-6, 1.e-5] # the values in the list will be grid-searched\n",
    "    learning_rate: 1.e-3 # it is equivalent to [1.e-3]\n",
    "    batch_size: 128 # the value will override the default value in FM_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In what follows, we show a real example to tune FM model on Criteo_x4. Criteo_x4 is a reusable dataset split of the widely-used Criteo data, which can be obtained from the [BARS benchmark](https://github.com/openbenchmark/BARS/tree/master/ctr_prediction/datasets/Criteo#Criteo_x4). After downloading, you can put the csv data into `data/Criteo/Criteo_x4`.\n",
    "\n",
    "Then, you could save the following tunner config into the file `./benchmarks/tuner_config/FM_criteo_x4_tuner_config_02.yaml`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# FM_criteo_x4_tuner_config_02.yaml\n",
    "base_config: ../config/\n",
    "base_expid: FM_base\n",
    "dataset_id: criteo_x4\n",
    "\n",
    "dataset_config:\n",
    "    criteo_x4:\n",
    "        data_root: ../data/Criteo/\n",
    "        data_format: csv\n",
    "        train_data: ../data/Criteo/Criteo_x4/train.csv\n",
    "        valid_data: ../data/Criteo/Criteo_x4/valid.csv\n",
    "        test_data: ../data/Criteo/Criteo_x4/test.csv\n",
    "        min_categr_count: 10\n",
    "        feature_cols:\n",
    "            - {name: [I1,I2,I3,I4,I5,I6,I7,I8,I9,I10,I11,I12,I13],\n",
    "               active: True, dtype: float, type: categorical, preprocess: convert_to_bucket, na_value: 0}\n",
    "            - {name: [C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13,C14,C15,C16,C17,C18,C19,C20,C21,C22,C23,C24,C25,C26],\n",
    "               active: True, dtype: str, type: categorical, na_value: \"\"}\n",
    "        label_col: {name: Label, dtype: float}\n",
    "\n",
    "tuner_space:\n",
    "    model_root: './Criteo/FM_criteo_x4_001/'\n",
    "    embedding_dim: 16\n",
    "    regularizer: [0, 1.e-6, 1.e-5, 1.e-4]\n",
    "    learning_rate: 1.e-3\n",
    "    batch_size: 10000\n",
    "    seed: 2019"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the following command, you can obtain the experimental results in `FM_criteo_x4_tuner_config_02.csv`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd benchmarks\n",
    "!nohup python run_param_tuner.py --config ./tuner_config/FM_criteo_x4_tuner_config_02.yaml --gpu 0 1 > run.log &"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| time | reproducing command | expid | dataset_id | train | validation | test | \n",
    "|----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------|-------------------------------|------------|---------------------------------------|----------------------------------------|\n",
    "|20210831-232555 |[command] python run_expid.py --version pytorch --config Criteo/FM_criteo_x4_001/FM_criteo_x4_tuner_config_02 --expid FM_criteo_x4_001_df584306 --gpu 0|[exp_id] FM_criteo_x4_001_df584306|[dataset_id] criteo_x4_9ea3bdfc|[train] N.A.|[val] logloss: 0.449649 - AUC: 0.803118|[test] logloss: 0.449396 - AUC: 0.803503|\n",
    "| 20210901-024437|[command] python run_expid.py --version pytorch --config Criteo/FM_criteo_x4_001/FM_criteo_x4_tuner_config_02 --expid FM_criteo_x4_002_4661e593 --gpu 0|[exp_id] FM_criteo_x4_002_4661e593|[dataset_id] criteo_x4_9ea3bdfc|[train] N.A.|[val] logloss: 0.444749 - AUC: 0.807179|[test] logloss: 0.444486 - AUC: 0.807512|\n",
    "| 20210901-115913|[command] python run_expid.py --version pytorch --config Criteo/FM_criteo_x4_001/FM_criteo_x4_tuner_config_02 --expid FM_criteo_x4_003_3da0082a --gpu 0|[exp_id] FM_criteo_x4_003_3da0082a|[dataset_id] criteo_x4_9ea3bdfc|[train] N.A.|[val] logloss: 0.443421 - AUC: 0.808198|[test] logloss: 0.443109 - AUC: 0.808607|\n",
    "| 20210902-091353|[command] python run_expid.py --version pytorch --config Criteo/FM_criteo_x4_001/FM_criteo_x4_tuner_config_02 --expid FM_criteo_x4_004_3402a9bc --gpu 0|[exp_id] FM_criteo_x4_004_3402a9bc|[dataset_id] criteo_x4_9ea3bdfc|[train] N.A.|[val] logloss: 0.449190 - AUC: 0.801985|[test] logloss: 0.448863 - AUC: 0.802439|\n",
    "\n",
    "\n",
    ".\n",
    "\n",
    "For more running examples, please refer to the [BARS-CTR-Prediction](https://openbenchmark.github.io/ctr-prediction) benchmark."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
